ci: deterministic check for studio/frontend dep removals by danielhanchen · Pull Request #5478 · unslothai/unsloth

danielhanchen · 2026-05-16T05:06:35Z

Summary

Adds a deterministic CI gate that catches accidental breakage when dependencies are removed from studio/frontend/package.json.
The gate diffs the new manifest against the PR base, finds every package no longer declared, and confirms each one is either unused everywhere in the repo or still resolvable from the new dependency tree.
Backed by a 64-case test suite that covers common removals, adversarial / sneaky removal patterns, and direct classify-function unit cases.

This was motivated by the recent #5438 (drop unused next dep) and #5477 (drop 11 more unused deps). Both are safe today, but only because every removal was hand-verified. This change turns that verification into a repeatable check.

What the check catches

For every removed package, it greps the entire repo (every .ts/.tsx/.js/.jsx/.mjs/.cjs/.html/.htm/.css/.scss/.sass/.json/.jsonc/.md/.mdx/.py/.rs/.toml/.yml/.yaml/.sh/.ps1/.bat/Dockerfile* outside node_modules, dist, build, target, .git, __pycache__, .venv) for these patterns, then feeds an 8-line window into the classifier so multi-line statements are also caught:

import X from "pkg" / import { X } from "pkg/sub" (static imports, single or multi-line)
import type { X } from "pkg" (type-only imports)
import "pkg" (side-effect imports; CSS @import distinguished via lookbehind)
import("pkg") (dynamic imports)
export * from "pkg" / export { x } from "pkg" / export type { Foo } from "pkg" (re-exports, single or multi-line)
require("pkg") and require.resolve("pkg")
@import "pkg" (CSS imports)
<script src="...pkg"> and <link href="...pkg"> (HTML)
/// <reference types="pkg" /> (TypeScript triple-slash)
new URL("pkg/...", import.meta.url) patterns
CSS url(.../pkg/...) references
Backtick template literals containing the package path
JSDoc @import("pkg") references
Bare quoted-string occurrences in any JS-like file (catches plugin lists in vite.config, tsconfig paths, eslint plugin arrays, shadcn components.json, biome configs, etc.)

Word-boundary safeguards keep foo from matching foobar, lodash from matching lodash-es, and @radix-ui/react-label from matching @radix-ui/react-label-extra. Package names appearing only in plain-text comments or URL tails do not trip the detector.

How the install-graph reachability works

After collecting source references, the checker resolves whether the package would still install after the removal:

BFS from the new package.json's declared deps through the lockfile dep graph.
Only top-level node_modules/<name> paths count as importable from src/. Nested copies under another package are noted but flagged as not importable from bare specifiers.
Reachability is computed from the new package.json, not the lockfile's <root> map, so a stale lockfile cannot give a false OK-via-transitive verdict.

A pip install playwright reference in a Python file or workflow is filtered out so removing the npm playwright is reported correctly.

Hygiene warnings (non-fatal unless `--strict`)

package-lock.json's <root> dep map disagrees with package.json (lockfile drift, run npm install).
@types/<X> is declared but <X> is no longer declared anywhere.
An import statement in studio/frontend/src/ references a bare specifier that has no matching dep declared in any field of package.json.

Test suite (`tests/studio/test_frontend_dep_removal.py`)

Three suites, 64 cases total, all passing.

Suite 1: 24 end-to-end edge cases (subprocess-driven, full pipeline)

ID	Behaviour	Expected
C1	remove `next-themes` (2 src imports, no transitive parent)	FAIL
C2	remove `@xyflow/react` (root-only, src imports recipe-studio)	FAIL
C3	remove `katex` (kept top-level by `@streamdown/math`, `mermaid`, `rehype-katex`)	PASS
C4	remove `clsx` (kept by `streamdown`)	PASS
C5	remove `react` (kept by countless peers)	PASS
C6	remove `@radix-ui/react-slot` (kept by `radix-ui` + `@assistant-ui/react`)	PASS
C7	remove `zustand` (top-level via `@assistant-ui/react`; nested xyflow 4.x irrelevant)	PASS
C8	multi-remove with mixed safety	FAIL [3 pkgs]
C9	remove `@huggingface/hub` (5 src imports)	FAIL
C10	remove `tailwind-merge` (kept by `streamdown`)	PASS
C11	remove a name that never appeared in the base	PASS (no-op)
C12	move `@hugeicons/react` from deps to devDeps	PASS (not a removal)
C13	remove `@huggingface/hub` AND `@xyflow/react` together	FAIL [both]
C14	remove `dexie` (multiple src imports)	FAIL
C15	remove `motion` (20+ src imports)	FAIL
C16	remove `canvas-confetti` (used in confetti.tsx)	FAIL
C17	remove `recharts` (used in chart.tsx)	FAIL
C18	remove `js-yaml` (kept by `@eslint/eslintrc`; @types/js-yaml orphan warning)	PASS
C19	remove `node-forge` (used in providers-api.ts)	FAIL
C20	remove `@tauri-apps/api` (kept by all 5 `@tauri-apps` plugins)	PASS
C21	remove `mammoth`	FAIL
C22	remove `unpdf`	FAIL
C23	remove `remark-gfm` (kept by `streamdown`)	PASS
C24	remove `date-fns` (kept by `react-day-picker` and `@base-ui/react`)	PASS

Suite 2: 28 classify() unit cases (direct function call against hand-crafted snippets)

Covers each pattern shape: static import, side-effect import, dynamic import, require, css_import, html_script, html_link, re_export (4 variants: *, named, type, multi-line), template_literal, new_url, tsc_triple_slash, jsdoc_import, string_literal. False-positive guards: substring collision (foo vs foobar), exact-prefix collision (lodash vs lodash-es), scoped-pkg collision (@radix-ui/react-label vs ...-extra), plain-text comments, URL tails, Python files, markdown links.

Suite 3: 12 adversarial end-to-end cases (write synthetic files, run full script, clean up)

Each case drops a synthetic file under studio/frontend/src/__dep_check_adversarial__/, runs the script as a subprocess against a base that adds a synthetic package, and asserts the outcome. Then cleans up.

ID	Sneaky shape	Expected
A01	multi-line `import { ... } from "removed_pkg"`	FAIL
A02	`export * from "removed_pkg"`	FAIL
A03	`export { foo } from "removed_pkg"`	FAIL
A04	`export type { Foo } from "removed_pkg"`	FAIL
A05	similar-prefix collision (`pkg_e` vs `pkg_e_extra`)	PASS (no false positive)
A06	dynamic import	FAIL
A07	`new URL("pkg/...")`	FAIL
A08	string-concat dynamic import (genuinely unanalyzable)	PASS (acknowledged limit)
A09	usage only inside a JS comment	FAIL (conservative; err on safe side)
A10	Python file mentions pkg name	PASS (Python cannot import npm)
A11	markdown doc mentions pkg	PASS
A12	JSDoc `@import("removed_pkg")`	FAIL

CI integration

studio-frontend-ci.yml gets one new step on pull_request events:

- name: Dependency removal safety check
  if: github.event_name == 'pull_request'
  working-directory: ${{ github.workspace }}
  run: |
    python3 scripts/check_frontend_dep_removal.py --base "origin/${{ github.base_ref }}"
    python3 tests/studio/test_frontend_dep_removal.py

The script reads package-lock.json directly, so it completes in well under a second and runs before npm ci. The full test suite (64 cases) runs end-to-end in a few seconds.

Manual usage

python3 scripts/check_frontend_dep_removal.py
python3 scripts/check_frontend_dep_removal.py --base HEAD~1
python3 scripts/check_frontend_dep_removal.py --base origin/main --strict
python3 scripts/check_frontend_dep_removal.py \
  --base-pkg /tmp/old_package.json --head-pkg studio/frontend/package.json

Exit codes: 0 clean, 1 one or more removed deps still referenced and not reachable, 2 invocation error.

Test plan

Script run from a clean branch (no removals) returns [OK] no dependencies removed.
Script run against studio/frontend: drop unused dependencies, move type pkg to devDeps #5477's branch reports all 11 removals as OK / OK-via-transitive with reachability paths.
All 24 edge cases pass.
All 28 classify() unit cases pass.
All 12 adversarial end-to-end cases pass.
End-to-end FAIL trace for next-themes returns exit 1 with file:line citations of every offending import.
CI: studio-frontend-ci runs the new step on this PR (and reports no findings since this PR removes nothing).

Adds a CI gate that catches the common foot-gun: a dep dropped from studio/frontend/package.json that something in src/ still imports. scripts/check_frontend_dep_removal.py Diffs package.json against a git base ref, collects every package no longer declared, and for each one: 1. Greps the entire repo for any usage pattern (static / dynamic / side-effect imports, require, CSS @import, HTML script/link src, new URL(), triple-slash references, template literals, bare quoted strings in JS-like files). 2. Resolves whether the package would still install by BFS'ing the dep graph in the new lockfile starting from the new package.json's declared deps (so a stale lockfile does not give false OK-via-transitive results). 3. Distinguishes top-level node_modules/<name> from nested copies under other packages. Bare src/ imports only resolve to the top-level path. 4. Pip-installed playwright references are filtered, so removing the npm playwright (CI uses the pip one) is reported correctly. Additional hygiene checks (warnings, fail with --strict): - lockfile <root> dep map matches package.json (catches drift). - @types/X is not orphaned when X is no longer declared. - No src/ import points at a package not declared in any field. tests/studio/test_frontend_dep_removal.py 24 deterministic cases. Each patches a copy of the head package.json, runs the script, and asserts (exit status, reported FAIL list). Covers: - Genuinely-breaking removals: next-themes, @xyflow/react, @huggingface/hub, dexie, motion, canvas-confetti, recharts, node-forge, mammoth, unpdf. - Safe-via-transitive removals: katex, clsx, react, @radix-ui/react-slot, zustand, tailwind-merge, remark-gfm, date-fns, js-yaml, @tauri-apps/api. - Mixed multi-removal failing on the unsafe entries only. - Non-existent / not-in-base names (no-op). - Move from deps to devDeps (not a removal). .github/workflows/studio-frontend-ci.yml Runs the checker on pull_request events against origin/${{ github.base_ref }}, plus the edge-case suite.

classify() now catches sneaky shapes that an earlier line-only scan would miss: - multi-line `import { a, b } from "pkg"` and the same shape for `export { ... } from "pkg"` / `export * from "pkg"` / `export type ... from "pkg"`. - JSDoc `@import("pkg")` references. - Word-boundary fix so `foo` no longer matches `foobar` (subpath gate: after the package name we require closing quote or `/`). - Negative-lookbehind on `(?<!@)\bimport\b` so CSS `@import "X"` is classified as css_import, not side_effect_import. find_usage() now feeds an 8-line window (4 above / 4 below the grep hit) into classify() so multi-line import statements are picked up even though the initial grep is line-based. tests/studio/test_frontend_dep_removal.py now exercises three suites: - 24 edge cases: subprocess-driven, full-pipeline. - 28 classify() unit cases: direct function call against hand-crafted snippets. Covers static / side-effect / dynamic / require / css_import / html_script / html_link / re_export (4 variants) / template_literal / new_url / tsc_triple_slash / jsdoc_import / string_literal, plus false-positive guards (substring collision, plain-text comments, URL path tails, Python files, markdown). - 12 adversarial cases: write synthetic files under studio/frontend/src/__dep_check_adversarial__/, run the full script, then clean up. Confirms multi-line imports, re-exports, JSDoc @import, new URL, dynamic imports all FAIL when the underlying package is removed. Current total: 64 / 64 cases pass.

for more information, see https://pre-commit.ci

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 231d4f12cf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-16T05:08:45Z

+        working-directory: ${{ github.workspace }}
+        run: |
+          python3 scripts/check_frontend_dep_removal.py \
+            --base "origin/${{ github.base_ref }}"


Fetch the PR base before diffing dependencies

On pull_request runs, the checkout at the top of this job uses actions/checkout's default shallow fetch, which only fetches the triggering ref/commit, so origin/${{ github.base_ref }} is not guaranteed to exist in the workspace. In that common PR context, this command makes git show origin/main:studio/frontend/package.json fail inside read_pkg_at(), causing the new safety check to exit 2 and block frontend CI before typecheck/build even run; fetch the base ref/history or use github.event.pull_request.base.sha after fetching it.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request introduces a new utility script, scripts/check_frontend_dep_removal.py, and a corresponding test suite to prevent breaking changes when removing npm dependencies from the frontend. The script verifies that removed packages are no longer referenced in the codebase and are not required transitively. Review feedback identifies duplicated logic in the main function regarding lockfile handling, suggests explicitly specifying UTF-8 encoding for file reads to ensure cross-platform consistency, and recommends removing the unused lock_resolvable function.

gemini-code-assist · 2026-05-16T05:10:33Z

+    for file, lineno, content in rows:
+        if pkg == "playwright" and PIP_PLAYWRIGHT.search(content):
+            continue
+        # Try the single-line classify first.


This block contains duplicated code. The line head_lock = read_pkg_file(Path(args.head_lock)) is executed regardless of the condition. Additionally, the --base-lock argument is defined but its value is never used, which can be misleading.

You can simplify this to a single line. If the --base-lock argument is indeed unnecessary, consider removing it from the argument parser as well.

head_lock = read_pkg_file(Path(args.head_lock))

References

When a condition or calculated value is used across multiple conditional branches, compute it once and reuse the result to ensure consistency and improve maintainability.

gemini-code-assist · 2026-05-16T05:10:33Z

+    res = subprocess.run(
+        cmd,
+        cwd = cwd or REPO_ROOT,
+        stdout = subprocess.PIPE,


It's a good practice to explicitly specify the encoding when reading text files to avoid issues on different platforms. Since package.json and package-lock.json files are expected to be UTF-8, please add encoding="utf-8".

Suggested change

res = subprocess.run(

cmd,

cwd = cwd or REPO_ROOT,

stdout = subprocess.PIPE,

def read_pkg_file(path: Path) -> dict:

if not path.exists():

return {}

return json.loads(path.read_text(encoding="utf-8"))

gemini-code-assist · 2026-05-16T05:10:33Z

+    if not out.strip():
+        return {}
+    return json.loads(out)
+
+
+def read_pkg_file(path: Path) -> dict:
+    if not path.exists():
+        return {}
+    return json.loads(path.read_text())
+
+


This function lock_resolvable appears to be unused. To improve maintainability, it's best to remove dead code.

Catches the last common false-negative: removing a package whose bin is only referenced through `package.json` scripts (e.g. dropping typescript while `"build": "tsc -b && vite build"` calls tsc). Cross-checked the patterns Vercel/Next.js, Vite, and TanStack use in their own manifests; the bin/scripts pairing is the one consumer-side pattern dep checkers commonly miss. How it works: - Build a bin-to-package map from each lockfile entry's `bin` field. The map is global so a stale lockfile still resolves bins from packages about to be pruned. - Tokenize each script value, splitting on `&&`, `||`, `;`, `|`. Strip env-var assignments and `npx / pnpx / yarn / pnpm / bunx` prefixes, plus `./node_modules/.bin/` and `node_modules/.bin/` path prefixes. Look up the leading token in the bin map. - Hits are reported as `script_bin` and feed the same reachability gate as source imports. A bin still installed transitively (e.g. vite via @vitejs/plugin-react peer) is OK-via-transitive; an orphaned bin is FAIL. Test additions: - 5 new edge cases: removing vite, typescript, eslint, @biomejs/biome, and (@biomejs/biome + @vitejs/plugin-react) together. Correctly flags @biomejs/biome and the combo as FAIL while vite / typescript / eslint are kept by peers. - 8 new classify() unit cases: TypeScript ambient `declare module`, namespace imports, combined default+named, default-as-named, re-export default (4 forms), `.then()` dynamic imports without await, and TypeScript `import()` in type position. Current total: 29 edge + 36 classify-unit + 12 adversarial = 77 / 77.

for more information, see https://pre-commit.ci

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17705256c3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-16T05:21:00Z

+    if args.base_lock:
+        head_lock = read_pkg_file(Path(args.head_lock))
+    else:
+        head_lock = read_pkg_file(Path(args.head_lock))


Use --base-lock when the override is provided

The --base-lock flag is parsed and documented, but this branch reads args.head_lock in both cases, so providing --base-lock has no effect. That breaks the advertised CLI contract for offline/synthetic comparisons and can silently produce results from the wrong lockfile when users think they are overriding the base lock input.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-16T05:21:00Z

+          python3 scripts/check_frontend_dep_removal.py \
+            --base "origin/${{ github.base_ref }}"
+          python3 tests/studio/test_frontend_dep_removal.py


Include checker files in PR path triggers

This new CI step executes scripts/check_frontend_dep_removal.py and tests/studio/test_frontend_dep_removal.py, but the workflow's pull_request.paths filter only includes studio/frontend/** and the workflow file itself. As a result, PRs that modify either checker file can bypass this job entirely, allowing regressions in the new safety gate to merge without running its own CI.

Useful? React with 👍 / 👎.

After surveying package.json patterns in 10+ popular repos (React, Vue/Svelte/Astro/Next.js, Vite, Storybook, TanStack/Query, Tailwind, ESLint, TypeScript, Prettier, SvelteKit), several config fields in package.json itself can reference packages by string. My checker filtered all of package.json out of the string_literal fallback, so removing a package that is only referenced from one of these fields was a false negative. Now covered (new pkg_json_field kind): - overrides / resolutions / pnpm.overrides keys - pnpm.patchedDependencies keys - peerDependenciesMeta keys - prettier: "@my/prettier-config" string - eslintConfig.extends (string or array) - stylelint.extends / stylelint.plugins - babel.presets / babel.plugins - jest.preset / jest.setupFiles / jest.transform - commitlint.extends - renovate.extends - remarkConfig.plugins - any other tool config field whose strings/keys equal the pkg name or `pkg/subpath` False-positive guards (do not flag string values inside): - browserslist (browser queries) - keywords (free-form strings) - engines / engineStrict / packageManager / volta (version pins) - files / directories / publishConfig (paths) - workspaces (paths/globs) - main / module / browser / types / typings / exports / imports / bin / man (author-side fields) - scripts (already handled separately via scripts_bin_refs) - name / version / description / author / repository / homepage etc. Test additions: new PkgFieldCase suite with 19 cases covering each tool config field, subpath references, and the 5 false-positive guards. Combined with the existing 29 edge / 36 classify / 12 adversarial cases, the suite is 96 / 96.

Adds an opt-in dead-dep enumeration to the existing safety check. Iterates every package declared in studio/frontend/package.json (all four dep fields combined) and reports each as one of: used at least one detected reference -- in src/, a config file, package.json scripts (bin), a package.json tool-config field (overrides / prettier / eslintConfig / stylelint / babel / jest / commitlint / renovate / etc.), or tsconfig.compilerOptions.types unused no detected reference anywhere type_pkg_kept @types/X where X is still declared (or X = node, always implicit) type_pkg_orphan @types/X where X is no longer declared -- candidate for removal alongside X Wiring: - New CLI flag `--enumerate-dead` (off by default). - CI workflow now passes `--enumerate-dead` so the report shows on every PR run; the report is informational unless `--strict` is also set. - With `--strict`, unused / type_pkg_orphan entries fail the run. Tests: - 5 new EnumCase scenarios: E01 fake dep with no usage -> reported unused E02 fake dep imported by a synthetic src file -> reported used E03 fake dep referenced only in overrides -> reported used E04 @types/X paired with X (also imported) -> kept E05 @types/X without X -> orphan Running the new flag against the current main reproduces exactly the 11 deps PR #5477 removed, validating the heuristic end to end. Current total: 29 edge + 36 classify + 12 adversarial + 19 pkg-json field + 5 enumeration = 101 / 101.

for more information, see https://pre-commit.ci

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2660f408ec

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-16T05:34:48Z

+    if not path.exists():
+        return {}


Treat missing head lockfile as invocation error

read_pkg_file() returns {} when the --head-lock path is missing, but main() never validates that case before proceeding. That makes the checker silently run without lock metadata and can report false PASS results for removals that are only referenced through npm script bins (for example, removing @biomejs/biome with a bad --head-lock path returns 0 even though biome scripts break). Since the CLI contract says missing files should be an invocation error, this should exit 2 instead of continuing with an empty lock graph.

Useful? React with 👍 / 👎.

actions/checkout uses fetch-depth: 1 by default, so when the dependency removal check ran `git show origin/main:.../package.json` the ref wasn't available locally and the script exited 2 with "could not read base package.json at origin/main:...". Fetch the single base commit before invoking the check so the git-show lookup resolves. --depth=1 keeps the extra fetch cheap.

Five issues flagged across gemini and codex: * --base-lock argparse arg was defined and advertised in the docstring, but main() always read args.head_lock in both branches -- the flag did nothing. Dropped the dead arg and the misleading docstring line; the lockfile-reachability analysis only needs the head lockfile. * lock_resolvable() was defined but never called. Removed. * read_pkg_file() did not specify an encoding for read_text(). Added encoding="utf-8" for cross-platform stability. * read_pkg_file() returned {} when the path did not exist, so a bad --head-lock value silently bypassed the reachability checks (false PASS for removals that resolve through npm script bins). main() now exits 2 with a clear message when the head lockfile is missing, matching the existing behavior for the head pkg. * studio-frontend-ci.yml pull_request paths filter only matched studio/frontend/** and the workflow file, so PRs that modified the checker script or its test could skip this job. Added both files to the trigger.

danielhanchen · 2026-05-16T09:25:58Z

Addressing the codex + gemini review:

Source	Issue	Resolution
codex P1 (resolved)	base ref fetch missing	`340f16c` — `git fetch --no-tags --depth=1 origin "${{ github.base_ref }}"` before the check
codex P2, gemini HIGH	`--base-lock` defined but unused; duplicated `head_lock = read_pkg_file(...)`	`bdf6f84` — dropped the dead arg, simplified to one read, updated docstring
gemini MEDIUM	missing `encoding="utf-8"` in `read_pkg_file`	`bdf6f84` — added
gemini MEDIUM	`lock_resolvable` is dead code	`bdf6f84` — removed
codex P2	`read_pkg_file` returns `{}` for missing path, silent false PASS	`bdf6f84` — main() now exits 2 when `--head-lock` path is missing, matching the existing behavior for `--head-pkg`
codex P2	`pull_request.paths` filter doesn't include checker script + test	`bdf6f84` — added `scripts/check_frontend_dep_removal.py` and `tests/studio/test_frontend_dep_removal.py` to the trigger

python3 tests/studio/test_frontend_dep_removal.py still passes 101/101 against the trimmed code path.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bdf6f84844

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-16T09:28:07Z

+    print()
+
+    reachable_paths = reachable_from_head(head_pkg, head_lock) if head_lock else set()
+    bin_to_pkg = build_bin_to_pkg(head_lock) if head_lock else {}


Track script bins from the base lockfile

When a PR removes a package that is used only through an npm script and updates package-lock.json at the same time, building bin_to_pkg from the head lockfile drops the removed package's bin metadata before scripts_bin_refs() runs. For example, removing @biomejs/biome from both package.json and the lockfile leaves biome:check/biome:fix pointing at a missing binary, but this checker reports [OK] @biomejs/biome because the head lock no longer contains the biome -> @biomejs/biome mapping. This lets the new safety gate miss broken npm scripts for properly updated removals.

Useful? React with 👍 / 👎.

danielhanchen · 2026-05-16T09:28:28Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new script, check_frontend_dep_removal.py, along with a comprehensive test suite to prevent breaking changes when removing npm dependencies. The script identifies removed packages by diffing package.json and verifies they are no longer referenced in the codebase or remain resolvable transitively. Review feedback suggests several improvements to the script's robustness and efficiency, including expanding the import detection regex to cover dynamic imports and require calls, updating the list of Node.js builtins and internal aliases to prevent false positives, and optimizing the repository scanning process to avoid redundant file system traversals.

gemini-code-assist · 2026-05-16T09:32:05Z

+    for f in DEP_FIELDS:
+        decl.update((head_pkg.get(f) or {}).keys())
+    # Also: anything tsconfig path-aliases (just '@/...' here) is internal
+    pattern = r"(?:from|import)\s+['\"]([^'\"./][^'\"]*)['\"]"


The regex pattern used for the reverse check (finding imports without declarations) only identifies static import and from statements. It misses dynamic import() calls and require() statements, which are correctly handled in the primary removal check. This inconsistency might lead to missing undeclared dependencies that are only used dynamically.

Suggested change

pattern = r"(?:from|import)\s+['\"]([^'\"./][^'\"]*)['\"]"

pattern = r"(?:from|import|require)\s*\(?\s*['\"]([^'\"./][^'\"]*)['\"]"

gemini-code-assist · 2026-05-16T09:32:05Z

+                continue
+            if pkg_name in {


The internal alias check only accounts for the exact string "@". Most TypeScript configurations in this project likely use "@/*" for path mapping (as hinted by the comment), resulting in specifiers like "@/components/...". These will currently trigger false positive "missing dependency" warnings because they don't match the exact "@" check.

# Internal aliases like '@/foo' or starts with builtin names if pkg_name == "@" or pkg_name.startswith("@/"): continue

gemini-code-assist · 2026-05-16T09:32:05Z

+            if pkg_name in {
+                "node:fs",
+                "node:path",
+                "fs",
+                "path",
+                "url",
+                "stream",
+                "crypto",
+                "buffer",
+                "util",
+                "events",
+                "child_process",
+            }:
+                continue


The list of Node.js builtin modules is incomplete. Common modules such as os, http, https, and zlib are missing. Additionally, the check does not account for the node: protocol prefix, which is the modern standard for importing builtins. This can lead to false positive warnings during the hygiene check.

if pkg_name in { "assert", "async_hooks", "buffer", "child_process", "cluster", "console", "constants", "crypto", "dgram", "diagnostics_channel", "dns", "domain", "events", "fs", "http", "http2", "https", "inspector", "module", "net", "os", "path", "perf_hooks", "process", "punycode", "querystring", "readline", "repl", "stream", "string_decoder", "timers", "tls", "trace_events", "tty", "url", "util", "v8", "vm", "wasi", "worker_threads", "zlib", } or pkg_name.startswith("node:"): continue

gemini-code-assist · 2026-05-16T09:32:05Z

+
+    failures: list[tuple[str, list[Hit]]] = []
+    for name in removed:
+        hits = find_usage(name)


Executing a full repository grep for every removed package is inefficient. If multiple dependencies are removed, the entire codebase is scanned repeatedly. This scales poorly, especially when using the --enumerate-dead flag which triggers a scan for every declared dependency (currently ~67). Batching the search into a single grep execution using a combined regex pattern for all target packages would significantly improve performance.

References

To improve efficiency, avoid redundant data iterations. Combine checks and transformations into a single loop and return computed values for callers to reuse.

To improve efficiency, avoid redundant file traversals.

Eight P1s and three P2s surfaced across 10 codex reviewers; this commit addresses all of them. P1s: 1. Workflow refspec. `git fetch --depth=1 origin <base_ref>` may only create FETCH_HEAD in shallow PR checkouts; the checker then dies with `fatal: invalid object name 'origin/main'`. Use the explicit refspec `<base>:refs/remotes/origin/<base>` so origin/<base> is reliably created. 2. `_deps_of()` was counting optional peer dependencies as reachable. npm only installs an optional peer when another package declares the same dep, so for "is this removed package still in the tree" they cannot keep it alive on their own. Skip entries marked `optional: true` in `peerDependenciesMeta`. 3. JS-syntactic classifiers (static_import, side_effect_import, dynamic_import, require, re_export, jsdoc_import, template_literal, tsc_triple_slash, new_url) now gate on file extension. Previously only the final string-literal fallback was gated, so a JS-shaped string inside a Python fixture or a Markdown code fence triggered a false FAIL. Added U37-U40 covering .py / .md / .sh / .yml. 4. HTML `<script src=>` and `<link href=>` patterns now respect a package-name boundary so `/node_modules/foo-extra/...` is not treated as a usage of `foo`. Added U41-U43. 5. New `find_command_usage()` detects CLI invocations in .sh / .yml / .yaml / .ps1 / .bat / Dockerfile* (npx pkg, bunx pkg, pnpm exec pkg, yarn dlx pkg, or a bare pkg --flag). Also covers scoped CLI packages exposed by their unscoped tail (@biomejs/biome -> biome). 6. `build_bin_to_pkg(head_lock)` was losing the bin -> package map for packages the PR correctly removed from the lockfile, so `scripts.biome:check` no longer flagged when @biomejs/biome was being dropped. Now also read the base lockfile (via `git show` or the new `--base-lock` override) and layer its bin map on top for any package in the removed set. 7. `--strict` now runs hygiene checks (lockfile sync, @types orphans, undeclared imports, dead-deps) on the no-removal path too. Previously the early return at "[OK] no dependencies removed" skipped them, so `--strict` silently passed on a tree with uncommitted lockfile drift or unused deps. 8. Removed `@types/X` packages are now matched against the runtime target name `X`: `/// <reference types="X" />`, tsconfig compilerOptions.types entries, AND runtime `import "X"` shapes. Handles the npm scope encoding (`@types/foo__bar` -> `@foo/bar`). P2s: 9. CSS `url(...)` now accepts both quoted and unquoted forms (added U44-U45). The previous regex required `/{pkg}/` after a slash, missing bare-package urls like `url(katex/fonts/x.woff2)`. 10. `find_imports_without_decl()` now covers all static-import shapes: `import "pkg"`, `import Foo from "pkg"`, `import { Foo } from "pkg"`, `import type { Foo } from "pkg"`, `await import("pkg")`, `require("pkg")`. 11. (Same as #8.) Removed `@types/X` is also linked to runtime imports of `X`, not just type-only references. Test suite expanded from 101 to 110 cases; all pass. Real-world enumerate-dead still flags the same 11 unused packages on studio/dep-removal-safety-check (matches PR 5477's removal set).

for more information, see https://pre-commit.ci

danielhanchen · 2026-05-16T10:09:50Z

Addressing the 10x reviewer run (commit d0a5d5a). All 8 P1s and 3 P2s land in one focused commit; test suite expanded from 101 to 110 cases, all green.

#	Pri	Finding	Fix
1	P1	workflow `git fetch --depth=1 origin <base>` may only update FETCH_HEAD	explicit refspec `<base>:refs/remotes/origin/<base>`
2	P1	`_deps_of()` counted optional peers as reachable	skip entries marked `optional: true` in `peerDependenciesMeta`
3	P1	JS import classifiers ran on Python/Markdown/shell/YAML	gated all JS-syntactic patterns on `SCRIPT_LIKE_EXT` (added U37-U40)
4	P1	HTML `<script>` / `<link>` matched `/node_modules/foo-extra/` as usage of `foo`	added path-segment boundary `(?:/[^'\"#?]*)?(?=['\"#?])` (U41-U43)
5	P1	`npx pkg` / `bunx pkg` / `pnpm exec pkg` in shell/yml/ps1/Dockerfile ignored	new `find_command_usage()`, includes scoped-CLI tail bin names (`@biomejs/biome` -> `biome`)
6	P1	`build_bin_to_pkg(head_lock)` lost bin->pkg map for packages the PR correctly removed	layer `build_bin_to_pkg(base_lock)` on top for removed packages; new `--base-lock` override flag (re-introduced and now actually wired)
7	P1	`--strict` skipped hygiene checks on no-removal path	hygiene runs unconditionally before the early no-removal return
8	P1	removed `@types/X` not matched against runtime `X`	`find_types_runtime_usage()` checks `import "X"`, `/// <reference types="X" />`, and `compilerOptions.types: ["X"]`, with npm scope decoding (`@types/foo__bar` -> `@foo/bar`)
9	P2	CSS `url(...)` missed unquoted bare-package paths	extended regex (U44-U45)
10	P2	`find_imports_without_decl()` only matched `import "pkg"`, not `import Foo from "pkg"`	new pattern covers all static-import shapes plus `require()` and `await import()`
11	P2	(subsumed by #8)	same fix

Local verification:

python3 tests/studio/test_frontend_dep_removal.py
  29/29 edge cases pass
  45/45 classify-unit cases pass
  12/12 adversarial cases pass
  19/19 package.json-field cases pass
  5/5 enumeration cases pass

python3 scripts/check_frontend_dep_removal.py --base origin/main --enumerate-dead
  [OK] no dependencies removed
  unused (11): @assistant-ui/react-markdown, @assistant-ui/react-streamdown,
               @langchain/core, @radix-ui/react-{checkbox,label,select,separator},
               @streamdown/cjk, date-fns, playwright, remark-gfm

Real-world enumeration still flags the exact set PR #5477 was originally dropping, so the new gates don't introduce false positives on this repo.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d1b21c0ed

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-16T10:13:47Z

+            if first in {"npx", "pnpx", "yarn", "pnpm", "bunx"} and idx + 1 < len(
+                words
+            ):
+                idx += 1
+                first = words[idx]


Parse pnpm exec and yarn dlx script wrappers correctly

Update the wrapper parsing in scripts_bin_refs() to skip subcommands before resolving the binary name. Right now the code advances only one token for pnpm/yarn, so pnpm exec biome check is interpreted as binary exec and yarn dlx eslint as binary dlx, which never map back to the removed package. This creates a false PASS when a dependency is only referenced via those script forms. (For example, pnpm help exec shows usage pnpm ... exec <command>, so the executable token is after exec, not immediately after pnpm.)

Useful? React with 👍 / 👎.

Three blockers from the parallel Opus review batch: 1. scripts_bin_refs ignored every script that began with a wrapper. The original "first non-env token wins" heuristic credited cross-env / dotenv / dotenvx / env-cmd as the bin, so a script like `cross-env CI=1 biome check` left @biomejs/biome looking unused. Rewrote into _next_real_bin(), which peels env prefixes, the leading package-manager runner (npx / pnpx / bunx / pnpm exec / yarn dlx), and the known wrapper bins (with --/-flag-arg handling) before returning the real CLI. shlex tokenization preserves quoted env values like `FOO="a b"`. 2. enumerate_dep_usage skipped find_command_usage. The non-enumerate path already credited deps used only from CI / Dockerfile / shell scripts, but `--enumerate-dead` did not, so packages referenced only from a workflow were silently listed as dead. Added the same call (gated against @types/* to avoid the unscoped-tail false positive). 3. classify multi-line window was ±4 lines. Prettier formats long named-import lists one identifier per line, so a 20-import block pushed the `import` keyword out of the window and the dep dropped to the string-literal fallback (or worse, was missed entirely). Widened to ±25 -- still bounded enough to keep false-positives negligible, wide enough for the realistic Prettier ceiling. Tests: added 10 _next_real_bin unit cases + 4 scripts_bin_refs end-to-end cases (W01-W10 + I01-I04) and a 22-identifier multi-line import adversarial case (A13). Full suite: 125/125.

for more information, see https://pre-commit.ci

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 418b63b7cd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-16T11:11:34Z

+        # CLI invocations in shell scripts / workflows / Dockerfiles.
+        hits.extend(find_command_usage(name))
+        # @types/X is "used" if X is referenced as a type or as a


Skip command-bin detection for removed @types packages

The removal loop always calls find_command_usage(name) even when name is an @types/* package, but find_command_usage() derives candidate binaries from the package tail (for @types/npm, it scans for npm). That makes unrelated workflow/shell prose or commands containing that word count as a usage and can incorrectly fail safe removals of type-only deps that are not runtime CLIs. In practice this produces false FAIL results for synthetic removals like @types/npm, so dependency cleanup PRs can be blocked even when no TypeScript/runtime reference exists.

Useful? React with 👍 / 👎.

danielhanchen · 2026-05-16T11:12:45Z

Addressing the 4x Opus reviewer batch (commit 6b70e72, pre-commit.ci formatter pass on top in 418b63b).

Per the earlier direction, P1s + P2s only.

#	Pri	Finding	Fix
1	P2	`scripts_bin_refs` only looked at the first non-env token. A script like `cross-env CI=1 biome check` credited `cross-env` instead of `biome`, so a removed `@biomejs/biome` looked unused. `dotenv` / `dotenvx` / `env-cmd` had the same hole, and quoted env values (`FOO="a b" biome`) broke the naive split.	Rewrote as `_next_real_bin()`: peels env-prefix tokens (env-prefix run), the leading package-manager runner (`npx` / `pnpx` / `bunx` / `pnpm exec` / `yarn dlx`), and the known wrapper bins (with `--` and `-flag <arg>` handling). `shlex.split` preserves quoted env values.
2	P2	`enumerate_dep_usage` did not call `find_command_usage`, while the non-enumerate path did. A dep referenced only from a workflow / Dockerfile / shell script was silently listed as dead under `--enumerate-dead`.	Mirrored the call (gated against `@types/*` to avoid the unscoped-tail false positive: a removed `@types/foo` should not match invocations of `foo`).
3	P2	Multi-line classify window was `±4` lines. Prettier formats a 20+ named-import block one identifier per line, so `import` and `from "pkg"` were >4 lines apart and the dep dropped to the string-literal fallback or was missed.	Widened to `±25`. Still bounded enough that false-positives are negligible, wide enough for the realistic Prettier ceiling.

Tests: 10 _next_real_bin unit cases + 4 scripts_bin_refs end-to-end cases (W01-W10 + I01-I04) and a 22-identifier multi-line import adversarial case (A13). Full local suite: 125/125.

The three P3 comment-accuracy items about Bun lockfile migration wording on #5479 were left for a follow-up — the defensible policy ("skip Bun when no bun.lock is committed so a release install never silently generates one") is correct even if local Bun 1.3.11 does migrate. That work merged into the bun-removal commit on #5479 (the entire branch is now gone, so the wording becomes moot).

danielhanchen added 2 commits May 16, 2026 04:59

danielhanchen requested a review from rolandtannous as a code owner May 16, 2026 05:06

[pre-commit.ci] auto fixes from pre-commit.com hooks

231d4f1

for more information, see https://pre-commit.ci